Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes

Authors

  • Anoop Gupta
  • Wolf-Dietrich Weber
  • Todd C. Mowry
Abstract

As multiprocessors are scaled beyond single-bus systems, there is renewed interest in directory-based cache coherence schemes. These schemes rely on a directory to keep track of all processors caching a memory block. When a write to that block occurs, point-to-point invalidation messages are sent to keep the caches coherent. A straightforward way of recording the identities of processors caching a memory block is to use a bit vector per memory block, with one bit per processor. Unfortunately, when the main memory grows linearly with the number of processors, the total size of the directory memory grows as the square of the number of processors, which is prohibitive for large machines. To remedy this problem, several schemes that use a limited number of pointers per directory entry have been suggested. These schemes often cause excessive invalidation traffic. In this paper, we propose two simple techniques that significantly reduce invalidation traffic and directory memory requirements. First, we present the coarse vector as a novel way of keeping directory state information. This scheme uses as little memory as other limited pointer schemes, but causes significantly less invalidation traffic. Second, we propose sparse directories, where one directory entry is associated with several memory blocks, as a technique for greatly reducing directory memory requirements. The paper presents an evaluation of the proposed techniques in the context of the Stanford DASH multiprocessor architecture. Results indicate that sparse directories coupled with coarse vectors can save one to two orders of magnitude in storage, with only a slight degradation in performance.
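
The quadratic growth of a full bit vector directory, and the idea behind a coarse vector, can be illustrated with a small sketch. This is not code from the paper: the function names and parameters are hypothetical, and the coarse vector is modeled under the assumption that each bit stands for a fixed-size group of consecutive processors, so a write invalidates every processor in any group that holds a sharer. The abstract itself does not spell out these details.

```python
def full_bit_vector_bits(num_procs, blocks_per_proc):
    """Total directory bits with one presence bit per processor per block.

    Assumes main memory grows linearly with the processor count
    (blocks_per_proc blocks of memory are added with each processor).
    """
    total_blocks = num_procs * blocks_per_proc
    return total_blocks * num_procs  # grows as num_procs squared


def coarse_vector_targets(sharers, num_procs, vector_bits):
    """Processors that would receive an invalidation under a coarse vector.

    Assumption for this sketch: bit i covers processors
    [i * group_size, (i + 1) * group_size).
    """
    group_size = -(-num_procs // vector_bits)  # ceiling division
    marked_groups = {p // group_size for p in sharers}
    return [p for p in range(num_procs) if p // group_size in marked_groups]


if __name__ == "__main__":
    # Doubling the processor count quadruples the directory size.
    print(full_bit_vector_bits(4, 10), full_bit_vector_bits(8, 10))
    # One sharer (processor 5) among 16 processors, tracked with 4 bits.
    print(coarse_vector_targets({5}, 16, 4))
```

In this model the coarse vector sends invalidations to a whole group rather than broadcasting to every processor, which is one way to read the abstract's claim of less invalidation traffic for the same directory memory.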


Similar resources

Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes

acknowledgement messages sent in response to these invalidations. The operation is complete when all acknowledgements have been received. Another hardware issue concerns synchronization. In DASH, the directory bit vectors are also used to keep track of processors queued for a lock. In the case of the full bit vector we have enough space to keep track of all nodes. Consequently, when a lock is ...


Efficient Implementation of Cache Coherence in Scalable Shared Memory Multiprocessors

The cache coherence scheme for a scalable distributed shared memory multiprocessor should be efficient in terms of memory overhead for maintaining the directories, as well as network latency for a memory request. In this paper, we propose a cache coherence scheme which minimizes the memory access delay and at the same time, reduces the directory overhead by using a limited directory scheme. In t...


Dynamic Pointer Allocation for Scalable Cache Coherence Directories

The efficient implementation of cache consistency is one of the primary challenges in building shared memory multiprocessors with hundreds or thousands of processors. While directory-based coherency schemes are promising because they rely on point-to-point messages rather than a network broadcast mechanism, traditional directory organizations would use a prohibitive amount of memory in a large-...


A scalable organization for distributed directories

Although directory-based cache-coherence protocols are the best choice when designing chip multiprocessors with tens of cores on-chip, the memory overhead introduced by the directory structure may not scale gracefully with the number of cores. Many approaches aimed at improving the scalability of directories have been proposed. However, they do not bring perfect scalability and usually reduce t...


An Efficient Hybrid Cache Coherence Protocol for Shared Memory Multiprocessors

This paper presents a new tree-based cache coherence protocol which is a hybrid of the limited directory and the linked list schemes. By utilizing a limited number of pointers in the directory, the proposed protocol connects the nodes caching a shared block in a tree fashion. In addition to the low communication overhead, the proposed scheme also contains the advantages of the existing bit-ma...




Publication date: 1990